Improving the Template Generation for Chinese Character Error Detection with Confusion Sets
Abstract
In this paper, we propose a system that automatically generates templates for detecting Chinese character errors. We first collect a confusion set for each high-frequency Chinese character. Error types include pronunciation-related errors and radical-related errors. With the help of the confusion sets, our system generates possible error patterns in context, which are then used as detection templates. Combined with a word segmentation module, our system generates more accurate templates. The experimental results show that precision approaches 95%. Such a system should not only help teachers grade and check student essays, but also effectively help students learn how to write.
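The template idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy confusion sets, the small lexicon, and the function names `generate_templates` and `detect_errors` are all hypothetical, and the word segmentation module is left out. The sketch assumes that a template is a dictionary word with one character swapped for a confusable one, kept only if the result is not itself a dictionary word.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data for illustration; not taken from the paper.
CONFUSION_SETS: Dict[str, Set[str]] = {
    "在": {"再"},  # pronunciation-related (both read "zai")
    "再": {"在"},
    "已": {"己"},  # visually similar forms
    "己": {"已"},
}
LEXICON: Set[str] = {"現在", "再見", "已經", "自己"}


def generate_templates(
    lexicon: Set[str], confusion_sets: Dict[str, Set[str]]
) -> Dict[str, str]:
    """Map each generated error pattern to the word it most likely stands for."""
    templates: Dict[str, str] = {}
    for word in lexicon:
        for i, ch in enumerate(word):
            for wrong in confusion_sets.get(ch, ()):
                candidate = word[:i] + wrong + word[i + 1:]
                # Keep only patterns that are not valid words themselves.
                if candidate not in lexicon:
                    templates[candidate] = word
    return templates


def detect_errors(
    text: str, templates: Dict[str, str]
) -> List[Tuple[int, str, str]]:
    """Return (position, error pattern, suggested word) for every match."""
    hits: List[Tuple[int, str, str]] = []
    for pattern, correction in templates.items():
        start = text.find(pattern)
        while start != -1:
            hits.append((start, pattern, correction))
            start = text.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    templates = generate_templates(LEXICON, CONFUSION_SETS)
    for position, pattern, correction in detect_errors("我現再己經到了", templates):
        print(position, pattern, "->", correction)
```

Plain substring matching of this kind can flag legitimate sequences that span a word boundary: the pattern 在見 also occurs in the well-formed sentence 我在見客戶. That is one plausible reason why restricting matches with word segmentation would yield more accurate templates.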
Similar resources
Reducing the False Alarm Rate of Chinese Character Error Detection and Correction
The main drawback of previous Chinese character error detection systems is the high false alarm rate. To solve this problem, we propose a system that combines a statistical method and template matching to detect Chinese character errors. Error types include pronunciation-related errors and form-related errors. Possible errors of a character can be collected to form a confusion set. Our system auto...
NTOU Chinese Spelling Check System in CLP Bake-off 2014
This paper describes details of the NTOU Chinese spelling check system participating in the CLP2014 Bake-off. Confusion sets were expanded by using two language resources, Shuowen and Four-Corner codes. A new method to find spelling errors in legal multi-character words was proposed. Comparison of sentence generation probabilities is the main information for error detection and correction. A rule-based c...
中文混淆字集應用於別字偵錯模板自動產生 (Chinese Confusion Word Set for Automatic Generation of Spelling Error Detecting Template) [In Chinese]
In this research, we propose a system that uses automatically generated templates to detect Chinese spelling errors. First, we use frequently used Chinese characters to produce the Chinese confusion set. Based on a dictionary, our system automatically generates negative vocabulary templates with the help of the Chinese confusion set. Error types include pronunciation-related errors and rad...
Contextual post-processing based on the confusion matrix in offline handwritten Chinese script recognition
The inclusion of potentially correct characters in candidate sets is key to improving accuracy in the recognition of Chinese scripts in the aspect of contextual post-processing. This paper presents two methods based on a confusion matrix to recall the correct characters. The first method uses original candidates to conjecture the most likely correct characters, and then combines the conjectured...
Phonetic confusion analysis and robust phone set generation for Shanghai-accented Mandarin speech recognition
In this paper, accent issues are discussed for Shanghai-accented Mandarin speech recognition. The phonetic confusion is analyzed in detail based on the alignment between the surface form and the baseform transcriptions. Mutual information is used as the measure to extract the most confusing phoneme pairs. It was found that each phoneme in one pair can be easily misrecognized with the other. To ...
Journal title:
- IJCLCLP
Volume 15, Issue
Pages -
Publication date 2010